DOCUMENT RESUME



ED 044 611 



AC 008 771 



AUTHOR          Cain, Glen G.; Hollister, Robinson G.
TITLE           The Methodology of Evaluating Social Action Programs.
INSTITUTION     Wisconsin Univ., Madison. Center for Studies in
                Vocational and Technical Education.
PUB DATE        Nov 69
NOTE            33p.; Reprinted from Public-Private Manpower Policies
AVAILABLE FROM  Industrial Relations Research Institute, 431? Social
                Science Building, 1180 Observatory Drive, Madison,
                Wisconsin 53706

EDRS PRICE      EDRS Price MF-$0.25 HC-$1.75
DESCRIPTORS     *Action Research, Bibliographies, Cost
                Effectiveness, *Evaluation Criteria, Manpower
                Development, *Program Evaluation, *Research
                Methodology, *Social Action, Statistical Analysis



ABSTRACT 

Focusing on the evaluation of outcomes ("cost 
benefit analysis") in large scale social action programs, this paper 
examines issues relating to the adequacy of theory and methodology as 
well as the impact of different types of persons (academics, 
politicians, program administrators) involved in the evaluation 
process. Problems of evaluation design--control groups, the criterion
of replicability (applicability on a wider scale), statistical
analysis, and implications of differing socioeconomic criteria--are
considered in detail. The authors then propose a deliberately 
experimental approach which would permit program planners and 
administrators to learn faster by testing alternative concepts 
simultaneously. Using an analogy with the court system, they also 
discuss the potential value of a "rules in evidence" approach to 
setting standards for acceptance of evaluation results. The document 
includes 1° footnotes, followed by reprints and other publications by 
the Industrial Relations Research Institute and the Center for 
Studies in Vocational and Technical Education. (LY) 






THE UNIVERSITY OF WISCONSIN 



U.S. DEPARTMENT OF HEALTH, EDUCATION
& WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM THE PERSON OR
ORGANIZATION ORIGINATING IT. POINTS OF
VIEW OR OPINIONS STATED DO NOT NECESSARILY
REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.



CENTER FOR STUDIES 
IN VOCATIONAL AND 
TECHNICAL EDUCATION 



The Methodology of Evaluating 
Social Action Programs 

by Glen G. Cain and Robinson G. Hollister



Reprinted from

Public-Private Manpower Policies
Industrial Relations Research Association 
November 1969 




INDUSTRIAL RELATIONS RESEARCH INSTITUTE 







INDUSTRIAL RELATIONS RESEARCH INSTITUTE 
THE UNIVERSITY OF WISCONSIN 

The Industrial Relations Research Institute fosters and
coordinates study and research in the field of industrial 
relations throughout the various schools and departments 
of The University of Wisconsin. In carrying out these 
activities, the University continues a tradition of more 
than half a century of leading work in this field. 

The Institute's programs of graduate study, leading to 
Master's and Ph.D. degrees in Industrial Relations, are 
closely associated with course and seminar offerings in 
the departments of economics, history, political science,
psychology, and sociology in the College of Letters and
Science, and with those in the College of Engineering
and the Schools of Business, Education, and Law.

Faculty members participating in the research, teach- 
ing, and coordinating functions of the Institute are 
also drawn from these schools and departments. 

Current committee members include:

Executive Committee

Chairman, James Stern, Economics; Director, I.R.R.I.
Jack Barbash, Economics
Leonard Berkowitz, Psychology
Abner Brodie, Law
Murray Edelman, Political Science
Stephen C. Kleene (ex officio), Dean, L.&S.
Nathan P. Feinsinger, Law
Alton Johnson, Business
Richard U. Miller, Business; Associate Director, I.R.R.I.
Raymond Munts, Social Work
Gerald Nadler, Industrial Engineering
Robert Ozanne, School for Workers
Charles Perrow, Sociology
Morton Rothstein, History
Gerald G. Somers, Economics
George Strother, Vice Chancellor, University Extension
Carl Schramm, Student

Admissions Committee

Chairman, Richard U. Miller, Business
Arlen Christenson, Law
David Gustafson, Engineering
Solomon B. Levine, Economics and Business
Eleanor t Roe, Law
Donald Schwab, Business
Lee Dyer, Student

Curriculum Committee

Chairman, David B. Johnson, Economics
Charles Bridgman, Psychology
Alan Filley, Business
Donald Treiman, Sociology
Roy Adams, Student
Herbert G. Heneman, III, Student
Sanford Sherman, Student
John Baum, Student

Publications Committee

Chairman, Gerald G. Somers, Economics
L. L. Cummings, Business
Scott Cutlip, Journalism
Barbara Dennis, Journal of Human Resources
William Glade, Business and Economics
George Hagglund, School for Workers
Everett Kassalow, Economics
Jack Ladinsky, Sociology
Thomas Decotiis, Student

Research Committee

Chairman, Solomon B. Levine, Economics and Business
Andre Delbecq, Business
Alan Gross, Psychology
Richard Hamilton, Sociology
James E. Jones, Jr., Law
Hervey Juris, School for Workers
Robert J. Lampman, Economics
David Zimmerman, Student






Reprinted from PUBLIC-PRIVATE MANPOWER POLICIES 
Industrial Relations Research Association 

The Methodology of Evaluating 
Social Action Programs 

By Glen G. Cain * and Robinson G. Hollister * 

Apologia 

This paper is largely motivated by our experiences as academics 
who became directly enmeshed in the problems of a public agency 
which was under considerable pressure — generated by both the 
agency staff itself and external factors — to "evaluate” manpower, 
and other social action, programs.

It became evident that there were several major obstacles to 
effective evaluation in this context. These obstacles were created 
both by the several types of "actors” necessarily involved in such 
evaluation efforts and by complications and weaknesses in the 
theory and methodology to be applied. Difficulties of communica- 
tion among the “actors,” due both to differences in training and to 
suspicions about motives, often made it hard to distinguish between 
difficulties arising because the theory was weak and those arising 
because adequate theory was poorly understood. 

In this paper we try to separate out some of these issues, both 
those concerning the adequacy of theory and methodology and 

* This research was supported by funds granted to the Institute for 
Research on Poverty, University of Wisconsin, pursuant to the provisions of 
the Economic Opportunity Act of 1964. Professor Cain and Professor Hollister
are associated with the University of Wisconsin Department of Economics and
are members of the Institute staff. The authors are grateful to the following
persons, who have increased their understanding of the ideas in this paper or
have commented directly on an earlier draft (or have done both): David
Bradford, Frank Cassell, John Evans, Woodrow Ginsburg, Thomas Glennan,
Robert Levine, Guy Orcutt, Gerald Somers, Ernst Stromsdorfer, Harold Watts,
Arnold Weber, Burton Weisbrod, and Walter Williams. A longer version of
this paper is available as Discussion Paper 42-69 from the Institute for
Research on Poverty, University of Wisconsin, Madison. An intermediate length
version will appear in the volume consisting of the Proceedings of the North 
American Conference on Cost-Benefit Analyses, held in Madison, Wisconsin, 
May 14-16, 1969. 




those relating to the various sorts of actors. We have sought to 
couch the discussion in language that will make it available to 
academics, who we feel need a heightened awareness of the more 
practical difficulties of execution of evaluations in the social action 
context — and to public agency and political personnel, who we 
believe would benefit from increased sensitivity to the ways in 
which careful consideration of the design and careful control of 
evaluations can increase the power of the information derived 
from such efforts. The attempt to reach both audiences in one
paper produces a mixture of elements bound to strike members 
of either audience as, at some points, extremely naive and, at 
others, disturbingly recondite. We can only hope that such reac- 
tions will be transformed into a resolve to initiate a more meaning- 
ful dialogue on these issues, a dialogue we feel is crucial to the 
development of an effective approach to evaluations of social action 
programs. 

Introduction 

This paper began as a discussion of methods of evaluating man- 
power programs — programs which used to consist almost entirely 
of vocational training and various but limited types of assistance 
for the worker searching for jobs within local labor markets. But 
with the recent emphasis on problems of poverty and the disad- 
vantaged worker, manpower programs have come to involve reme- 
dial and general education, to intermesh with community action 
programs providing a variety of welfare services, and, on a trial 
basis, to assist in migration between labor markets. They are part 
of a broader class of programs which, for lack of a better term, 
we might call social action programs. Our paper will include many
references to this broader class, and in particular to anti-poverty 
programs. In so doing, we hope to provide a more general and 
more relevant perspective on the topic of evaluation methodology. 

We hold the opinion, apparently widely shared, that existing 
evaluations of social action programs (and we are including our
own) have fallen short of meeting the standards possible within








the disciplines of the social sciences. The reasons for these short- 
comings are easy to identify. The programs typically involve 
investments in human beings, a relatively new area of empirical 
research in economics. They are aimed at such social and political 
goals as equality and election victories, as well as economic objec- 
tives concerning, say, income and employment. They often attempt 
to deliver services on a large enough scale to make a noticeable 
impact upon the community. And at the same time, they are 
expected to provide a quasi-experimental basis for determining 
what programs ought to be implemented and how they ought to 
be run. 

It is not surprising, then, that evaluations of social action 
programs have often not been attempted and when attempted, have 
not been successful. Despite this background, we believe that 
existing data and methods permit evaluations which, while not 
satisfying the methodological purists, can at least provide the 
rules of evidence for judging the degree to which programs have 
succeeded or failed. Specifically, the theme we will develop is
that evaluations should be set up to provide the ingredients of 
an experimental situation: a model suitable for statistical testing, 
a wide range in the values of the variables representing the pro- 
gram inputs, and the judicious use of control groups. 

The paper reflects several backgrounds in which we have had
some experience: from economics, the tradition of benefit-cost
analyses; from the other social sciences, the approach of quasi-
experimental research; and from a governmental agency, the
perspective of one initiating and using evaluation studies. Each
of these points of view has its own literature which we have by 
no means covered, but to which we are indebted. 1 

Types of Evaluation 

There are two broad types of evaluation. The first, which we 
call "process evaluation," is mainly administrative monitoring. Any
program must be monitored (or evaluated) regarding the integrity
of its financial transactions and accounting system. There is
also an obvious need to check on other managerial functions, 
including whether or not accurate records are being kept. In sum, 
“process evaluation” addresses the question: Given the existence 
of the program, is it being run honestly and administered
efficiently?

A second type of evaluation, and the one with which we are 
concerned, may be called “outcome evaluation," more familiarly 
known as “cost-benefit analysis." Although both the inputs and 
outcomes of the program require measurements, the toughest 
problem is deciding on and measuring the outcomes. With this 
type of evaluation the whole concept of the program is brought 
into question, and it is certainly possible that a project might be 
judged to be a success or a failure irrespective of how well it 
was being administered. 

A useful categorization of cost-benefit evaluations draws a dis- 
tinction between a priori analyses and ex post analyses. An 
example of a priori analysis is the cost-effectiveness studies of 
weapons systems conducted by the Defense Department, which have
analyzed war situations where there were no "real outcomes" and,
thus, no ex post results with which to test the evaluation models. 
Similarly, most evaluations of water resource projects are confined 
to alternative proposals where the benefits and costs are estimated 
prior to the actual undertaking of the projects. 2 Only in the
area of social action programs such as poverty, labor training, 
and to some extent housing, have substantial attempts been made 
to evaluate programs, not just in terms of before-the-fact esti- 
mates of probable outcomes or in terms of simulated hypothetical 
outcomes, but also on the basis of data actually gathered during 
or after the operation of the program. 

A priori cost-benefit analyses of social action programs can, 
of course, be useful in program planning and feasibility studies, 
but the real demand and challenge lies in ex post evaluations. This 
more stringent demand made of social action programs may say 






something about the degree of skepticism and lack of sympathy 
Congress (or "society”) has concerning these programs, but this 
posture appears to be one of the facts of political life. 

Problems of the Design of the Evaluation 2A 

A. The Use of Control Groups 

Given the objective of a social action program, the evaluative 
question is: “What difference did the program make?”, and this 
question should be taken literally. We want to know the difference 
between the behavior with the program and the behavior if there 
had been no program. To answer the question, some form of con- 
trol group is essential. We need a basis for comparison — some base 
group that performs the methodological function of a control 
group. Let us consider some alternatives. 

The Before-and-After Study. In the before-and-after study,
the assumption is that each subject is his own control (or the 
aggregate is its own control) and that the behavior of the group 
before the program is a measure of performance that would have 
occurred if there had been no program. However, it is well known 
that there are many situations in which this assumption is not 
tenable. We might briefly cite some examples found in manpower 
programs. 

Sometimes the "before situation” is a point in time when the 
participants are at a particularly low state — lower, that is, than 
is normal for the group. The very fact of being eligible for par- 
ticipation in a poverty program may reflect transitory conditions. 
Under such conditions we should expect a “natural" regression 
toward their mean level of performance if we measure their status 
in an "after situation,” even if there were no program in the inter- 
vening period. Using zero earnings as the permanent measure
of earnings of an unemployed person is an example of attributing 
normality to a transitory status. 
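The regression effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the earnings levels, the spread of the transitory component, the eligibility cutoff, and the function name `simulate_before_after` are all hypothetical and are not taken from any program data. No program effect is simulated at all, yet the group selected for low "before" earnings shows a gain.

```python
import random
import statistics

def simulate_before_after(n=20000, cutoff=2000.0, seed=0):
    """Mean 'before' and 'after' earnings of persons selected for low 'before' earnings.

    No program effect is simulated: each period's earnings are a permanent
    level plus an independent transitory component.
    """
    rng = random.Random(seed)
    before_selected, after_selected = [], []
    for _ in range(n):
        permanent = rng.gauss(3000.0, 500.0)          # hypothetical normal earnings level
        before = permanent + rng.gauss(0.0, 1000.0)   # transitory component, first period
        after = permanent + rng.gauss(0.0, 1000.0)    # independent transitory component, second period
        if before < cutoff:                           # eligibility rule: low earnings when observed
            before_selected.append(before)
            after_selected.append(after)
    return statistics.mean(before_selected), statistics.mean(after_selected)

mean_before, mean_after = simulate_before_after()
spurious_gain = mean_after - mean_before   # positive although nothing was done for the group
print(round(mean_before), round(mean_after), round(spurious_gain))
```

A before-and-after comparison would credit `spurious_gain` to the program, although in this sketch it arises entirely from selecting people at a transitory low point.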

Another similar situation arises when young people are in- 
volved in the program. Ordinary maturation and the acquisition 






of experience over the passage of time would be expected to 
improve their wages and employment situation. 

There may be some structural change in the personal situa- 
tions of the participants before and after the program, which has 
nothing to do with the program but would vitiate any simple 
before-and-after comparison. We should not, for example, look upon
the relatively high earnings record of coal miners or packinghouse
workers as characteristic of their "before situation" if, in fact,
they have been permanently displaced from their jobs. 

As a final example of a situation in which the before-and-after 
comparison is invalid, there is the frequent occurrence of significant
environmental changes, particularly in labor market environments,
which are characterized by seasonal and cyclical fluctuations.
Is it the program or the changed environment which has
brought about the change in behavior? All of the above examples
of invalidated evaluations could have been at least partially 
corrected if the control groups had been other similar persons 
who were in similar situations in the pretraining period.
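The correction offered by a comparison group of similar persons amounts to simple arithmetic: subtract the change observed among similar nonparticipants from the change observed among participants. The sketch below assumes hypothetical mean earnings figures; the function name `net_program_effect` is likewise an illustration and not a procedure named in the text.

```python
def net_program_effect(participants_before, participants_after,
                       comparison_before, comparison_after):
    """Change among participants minus change among similar nonparticipants."""
    participant_change = participants_after - participants_before
    comparison_change = comparison_after - comparison_before
    return participant_change - comparison_change

# Hypothetical mean weekly earnings, in dollars.
naive_gain = 75.0 - 50.0                                # before-and-after comparison only
net_gain = net_program_effect(50.0, 75.0, 50.0, 68.0)   # the comparison group also improved
print(naive_gain, net_gain)
```

The comparison group's own improvement stands in for maturation, recovery from a transitory low, and labor market changes. The correction is only as good as the similarity of the two groups.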

Control Groups and Small Group Studies. The particular
strength of the small scale study is that it greatly facilitates the 
desideratum of random assignments to "treatment groups" and 
"control groups” or, at least, a closely supervised matching of 
treatment and control groups. Its particular shortcoming is that 
it is likely to lack representativeness— both in terms of the charac- 
teristics of the program participants and in terms of the character 
of the program. There is first the problem of a "hot house environ- 
ment” of the small group study. (See discussion of "replicability" 
below.) Second, a wide range of values of the program inputs 
(i.e., in terms of levels of a given treatment or in terms of quali- 
tatively different types of treatments) is less likely to be available 
in a small group study. Third, the small group study may not 
be able to detect the program's differential effects on different types 
of participants (e.g., by age, sex, color, residence, etc.) either
because the wide variety of participant types are not available or 




because their numbers are too small. Finally, it is both a strength 
and a weakness of the small scale study that it is usually confined
to a single geographic location. Thus, although "extraneous" noise
from different environments is eliminated, we may learn little or
nothing about how the program would operate in different
environments. 

Control Groups and Large Group Studies. The large scale
study, which involves gathering data over a wide range of environments,
customarily achieves "control" over the characteristics
of participants and nonparticipants and over programs and
environmental characteristics by statistical methods, rather than 
by randomization or careful matching, individual by individual. 
These studies have the capability of correcting each of the short- 
comings attributed to the small scale studies in the preceding 
paragraph. But because they are almost impossible to operate 
with randomization, the large scale studies run afoul of the familiar
problem in which the selectivity of the participants may
be associated with some unmeasured variable(s) which makes it
impossible to determine what the net effect of the treatment is. 
Since this shortcoming is so serious in the minds of many analysts, 
particularly statisticians, and because the small scale studies 
have a longer history of usage and acceptability in sociology and 
psychology, it may be worthwhile to defend at greater length the 
large scale studies, which are more common to economists. 

Randomization is seldom attempted for reasons having to do 
with the attitudes of the administrators of a program, local pres- 
sures from the client population, or various logistic problems. 
Indeed, all these reasons may serve to botch an attempted randomi- 
zation procedure. Furthermore, we can say with greater certitude 
that the ideal "double-blind experiment with placebos” is almost 
impossible to achieve. If we are to do something other than 
abandon evaluation efforts in the face of these obstacles to ran- 
domization, we will have to turn to the large scale study and the 
statistical design issues that go along with it. 







The fact that the programs vary across cities or among admin- 
istrators may be turned to our advantage by viewing these as 
“natural experiments” 3 which may permit an extrapolation of
the results of the treatment to the “zero" or “no-treatment" level. 
The analyst should work with the administrator in advance to 
design the program variability in ways which minimize the con- 
founding of results with environmental influences. Furthermore, 
ethical problems raised by deliberately excluding some persons 
from the presumed beneficial treatments are to some extent avoided
by assignments to differing treatments (although, here again, 
randomization is the ideal way to make these assignments). 
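One way to picture the extrapolation to the "no-treatment" level is to fit a line through outcomes observed at differing treatment levels and read off its intercept. The sketch below uses ordinary least squares, which is an assumption of this illustration and not a method prescribed in the text. The site figures and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical sites: weeks of training offered and mean post-program weekly earnings.
weeks_of_training = [4, 8, 12, 16, 20]
mean_earnings = [62.0, 66.0, 70.0, 74.0, 78.0]

zero_treatment_level, gain_per_week = fit_line(weeks_of_training, mean_earnings)
print(zero_treatment_level, gain_per_week)
```

The intercept lies outside the observed range of treatments, so it is only as trustworthy as the assumed straight-line relationship and the absence of environmental differences that vary along with the treatment level.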

It is difficult at this stage to provide more than superficial
observations regarding the choice between small and large-scale 
studies. It would seem that for those evaluations that have a 
design concept which is radically different from existing designs 
or where there is a quite narrow hypothesis which requires de- 
tailed examination, a small group study would be preferable. Con- 
versely, when the concept underlying a program is quite broad 
and where large amounts of resources are to be allocated, the large 
group approach is probably more relevant — a point argued in 
greater detail in our discussion of the “replicability criterion.” 

B. The Replicability Criterion 

A source of friction between administrators of programs and 
those doing evaluation research, usually academicians, is the failure 
to agree upon the level of decision-making for which the results 
of the evaluation are to be used. This failure, which is all the more 
serious because the issue is often not explicitly addressed, leads to 
disputes regarding two related issues — the scope of the evaluation 
study and the selection of variables to be studied. To deal with 
these disputes, we suggest applying the “replicability criterion.” 
We apply this name to the criterion because of the large number 
of cases in which evaluations of concepts have been made on the 
basis of projects which are not likely to be replicable on a large 











scale or which focus on characteristics of the project which are 
not within the ability of the decision-makers to control. To take an 
extreme example, it has sometimes been stated that the success 
of a compensatory education program depended upon the "warmth 
and enthusiasm” of the teachers. In the context of a nationwide 
program, no administrator has control over the level of "warmth 
and enthusiasm” of teachers. 

It is sometimes argued by administrators that evaluations 
which are based upon samples drawn from many centers of a pro- 
gram are not legitimate tests of the program concept since they 
do not adequately take into account the differences in the details 
of individual projects or of differentiated populations. These 
attitudes frequently lead the administrators or other champions 
of the program to select, either ex ante or ex post, particular 
“pet” projects for evaluations that "really count." In the extreme, 
this approach consists of looking at the successful programs (based 
on observations of ongoing or even completed programs) and then 
claiming that these are really the ones that should be the basis 
for the evaluation of the program as a whole. If these successful 
programs have worked with representative participants in repre- 
sentative surroundings and if the techniques used — including the 
quality of the administrative and operational personnel — can be 
replicated on a nationwide basis, then it makes sense to say that 
the evaluation of the particular program can stand for an evalua- 
tion of the overall program. But we can seldom assume these 
conditional statements. After all, each of the individual programs, 
a few political plums notwithstanding, was set up because someone 
thought it was worthwhile. Of course, some will flop because of poor 
teachers or because one or more operations were fouled up — but it 
is in the nature of the beast that some incompetent administrative 
and operational foul-ups will occur. A strength of summary,
over-all measures of performance is that they will include "acci- 
dental" foul-ups with the "accidental” successes, the few bad 
administrators and teachers as well as the few charismatic leaders. 















As a case in point, consider the success (according to prevailing
opinion) of Reverend Sullivan's Opportunities Industrialization
Center in Philadelphia, set against the absence (as yet) of any
evidence that the OIC idea has been successfully transferred elsewhere. 4

Small scale studies of pre-selected particular programs are 
most useful either for assessing radically different program ideas 
or for providing the administrator with information relevant to 
decisions of program content within the confines of his overall 
program. These are important uses, but the decisions at a broader 
level which concern the allocation of resources among programs of 
widely differing concepts call for a different type of evaluation
with a focus on different variables. 

It may be helpful to cite an example of the way in which the 
replicability criterion should have been applied. A few years ago, 
a broad scale evaluation of the Work Experience Program 5 was
carried out. (The evaluation was of necessity based upon very 
fragmentary data, but we are here concerned with the issues it 
raised rather than with its own merits.) The evaluation indicated 
that on the average the unemployment rates among the completers 
of the program were just as high as those with similar character- 
istics who had not been in the program. On the basis of this 
evaluation, it was argued that the concept of the program was 
faulty, and some rather major shifts in the design and in the 
allocation of resources to the program were advocated. 6 Other
analysts objected to this rather drastic conclusion and argued that 
the “proper" evaluative procedure was to examine individual 
projects within the program, pick out those projects which had 
higher "success rates," and then attempt to determine which 
characteristics of these projects were related to those “success 
rates." 7 

The argument as to which approach is proper depends on the 
particular decision framework to which the evaluation results 
were to be applied. To the administrators of the program, it is 
really the project by project type of analysis which is relevant 






to the decision variables which they control. The broader type of 
evaluation would be of interest, but their primary concern is to 
adjust the mix of program elements to obtain the best results 
within the given broad concept of the program. Even for program 
administrators, however, there will be elements and personnel 
peculiar to a given area or project that will not be replicable in 
other areas and other projects. 

For decision-makers at levels higher than the program adminis- 
trator the broader type of evaluation will provide the sort of 
information relevant to their decision frame. Their task is to 
allocate resources among programs based upon different broad 
concepts. Negative findings from the broader evaluation argue 
against increasing the allocation to the program, although a con- 
servative response might be to hold the line on the program while 
awaiting the more detailed project-by-project evaluation to deter- 
mine whether there is something salvageable in the concept em- 
bodied in the program. There will always be alternative programs 
serving the same population, however, and the decision-maker is
justified in shifting resources toward those programs which hold 
out the promise of better results. 

The basic point is that project-by-project evaluations are bound 
to turn up some “successful” project somewhere, but unless there 
is good evidence that that "success” can be broadly replicated and 
that the administrative controls are adequate to insure such repli- 
cation, then the individual project success is irrelevant. Resources 
must be allocated in light of evidence that concepts are not only 
“successful” on a priori grounds or in particular small-scale 
contexts but that they are in fact “successful” in large-scale 
implementation. 
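The claim that project-by-project evaluations are bound to turn up some "successful" project can be illustrated by simulation. In the sketch below every project has the same true effect, set to zero, and differs only by chance. The number of projects, the number of participants, the noise level, and the name `project_estimates` are hypothetical assumptions of the illustration.

```python
import random
import statistics

def project_estimates(n_projects=40, n_participants=50, true_effect=0.0, seed=1):
    """Estimated effect for each project when every project shares one true effect."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_projects):
        # Each participant's measured gain is the true effect plus chance variation.
        gains = [true_effect + rng.gauss(0.0, 10.0) for _ in range(n_participants)]
        estimates.append(statistics.mean(gains))
    return estimates

estimates = project_estimates()
overall = statistics.mean(estimates)   # summary measure over all projects
best = max(estimates)                  # the "pet" project picked after the fact
print(round(overall, 2), round(best, 2))
```

The summary measure stays near the true effect of zero, while the project selected after the fact looks like a success that, by construction, could not be replicated.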

C. The Theoretical Framework — Some Statistical 
Considerations. 

The main function of a theoretical framework in cost-benefit 
evaluations is to provide a statistical model suitable for testing. 









In this section a few brief remarks will be made about the statistical
design of the evaluation; a lengthier discussion of these
matters is taken up in another paper. 7A In these remarks we will
adopt the terminology of regression analysis, which is a statistical 
method flexible enough to handle an analysis of variance approach 
or that involved in simply working with cell values in tables. In 
the regression model, the dependent variable is the objective of
the social action program and the particular set of independent 
variables of most interest to us are those that describe or represent 
the program, or program inputs. In this discussion the inde- 
pendent variables will sometimes be referred to as "treatment 
variables.” 

It may be useful to divide the problems of statistical design 
into two categories: First, attaining acceptable levels of statistical 
significance on the measured effects of the treatment variables; 
second, measuring those effects without bias. We will not discuss 
the first problem here except to note that the failure to attain
statistical significance of the effect of the treatment variable occurs 
either because of large unexplained variation in the dependent 
variable or small effects of treatment variables and these can be 
overcome with sufficiently large sample sizes. In our opinion, the 
most serious defect in evaluation studies is bias in the measures of
effects of the treatment variables, and this error is unlikely to be 
removed by enlarging the sample size. 
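The contrast between the two problems can be shown by simulation. The sketch below takes the source of bias to be an unmeasured trait that influences both participation and the outcome, the selectivity problem raised earlier for large scale studies. The trait (called `motivation` here), its weights, the sample sizes, and the name `estimated_effect` are hypothetical. The true program effect is zero by construction.

```python
import random
import statistics

def estimated_effect(n, randomized, seed):
    """Difference in mean outcome, participants minus nonparticipants.

    The true program effect is zero by construction.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        motivation = rng.gauss(0.0, 1.0)                 # not measured by the evaluator
        if randomized:
            participates = rng.random() < 0.5            # assignment by chance
        else:
            participates = motivation + rng.gauss(0.0, 1.0) > 0.0   # self-selection
        outcome = 2.0 * motivation + rng.gauss(0.0, 1.0)  # contains no program term
        (treated if participates else untreated).append(outcome)
    return statistics.mean(treated) - statistics.mean(untreated)

small_selected = estimated_effect(400, randomized=False, seed=2)
large_selected = estimated_effect(40000, randomized=False, seed=2)
large_randomized = estimated_effect(40000, randomized=True, seed=2)
print(round(small_selected, 2), round(large_selected, 2), round(large_randomized, 2))
```

Enlarging the self-selected sample a hundredfold makes the estimate more precise but leaves it far from zero, whereas the randomized sample of the same size yields an estimate close to the true effect.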

One source of bias is inaccurate measures of the treatment 
variable, but a more pervasive and more serious problem is the 
presence of variables, not included in the statistical model, which 
are correlated with both the dependent variable and the treatment 
variable. Had the assignment to a program been made on a random 
basis, the laws of probability would have assured a low correlation 
(zero in the limit of a large enough sample size) between partici- 
pation in the program and these omitted variables. In the absence 
of randomization, we must fall back on statistical controls. At 
this point our theory and a priori information are crucially important
to identify the variables [illegible] the measured effects of the
treatment variables and [illegible] include them in the model. These
variables may be objectively measurable [illegible] previous work
[illegible] be such difficult-to-measure characteristics
[illegible] motivation, or an "appealing personality." 8

[Illegible] are [illegible] employment experience, health
[illegible] have [illegible] the data have been collected.

[Illegible] regarding the availability of objective measures of
important variables, if we do not have random [illegible] admit the
possibility that self-selectivity [illegible] procedures of the
program administrators have introduced a systematic difference
between the participants and the nonparticipants. We do not
claim, as the purists [illegible], that nonrandom procedures invalidate
all evaluations, although [illegible] when [illegible] undoubtedly
have, but the advantages of [illegible] we can only
convince each other of its importance [illegible].

[Illegible] which are correlated
with both the treatment variable and the dependent variable must
be included in the model [illegible] treatment [illegible] without
bias. However, [illegible] about the effect of [illegible]
treatments, and since the [illegible] within the
framework of the statistical model is the residual variation in
treatments, that is, variation which remains after the entire set



ERIC 






18 PUBLIC-PRIVATE MANPOWER POLICIES 

of independent variables is included, greater efficiency is obtained 
when the treatment variable is uncorrelated with the other inde- 
pendent variables. In the opposite extreme, if the treatment 
variables were perfectly correlated with some other variable or 
combination of variables, we would be unable to distinguish 
between which of the two sets of factors caused a change. It 
follows that even in the absence of randomization, designing the 
programs to be studied with as wide a range in levels and types 
of "treatments" as possible will serve to maximize the information 
we can extract from an ex post analysis. 
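In conventional least-squares notation (the symbols here are ours, introduced only for illustration, and assume a linear model with one treatment variable and errors of constant variance), the efficiency argument can be summarized by the sampling variance of the estimated treatment coefficient:

```latex
\operatorname{Var}\!\left(\hat{\beta}_T\right)
  = \frac{\sigma^2}{\left(1 - R_T^2\right)\sum_{i=1}^{n}\left(T_i - \bar{T}\right)^2}
```

Here $T_i$ is the treatment level for observation $i$, $\sigma^2$ is the error variance, and $R_T^2$ is the squared multiple correlation from regressing the treatment variable on the other independent variables. The denominator is the residual variation in treatments: it is largest when $R_T^2 = 0$ and when treatment levels are widely spread, and it vanishes as $R_T^2$ approaches 1, the case in which the treatment effect cannot be separated from the other factors.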

There are reasons in addition to those of statistical efficiency 
for planning for a wide range of values in the treatment or
programmatic variables. One is that social action programs have a 
tendency to change, rather frequently and radically, during the 
course of their operation. Evaluations designed to test a single 
type of program are rendered meaningless because the program- 
type perishes. But if the design covers a wider variety of pro- 
grams, then a built-in hedge against the effects of change is 
attained. Indeed, there is an even more fundamental reason why 
a wide range of inputs and program types should be planned for, 
and it is simply this: we seldom know enough about what will 
work in a social action program to justify putting our eggs in 
the single basket of one type of program. This evaluation model 
for a single type of project, sometimes described as the analogue of 
the "pilot plant," is not the appropriate model for social action 
programs given our current state of knowledge.9 

D. The Theoretical Framework: Some Economic Considerations. 

For operational purposes we will assume that the evaluation of 
each social action program can, at least in principle, be cast in the 
statistical model discussed in the previous section, complete with 
variables representing an objective of the program, treatment variables
representing the program inputs, control variables, and control
groups.10 However, the substantive theoretical content of 
these models (the particular selection of variables and their functional
form) must come from one or more of the traditional disciplines
such as educational psychology (e.g., for Head Start), 
demography (e.g., for a family planning program), medical science 
(e.g., for a neighborhood health center), economics (e.g., for a man- 
power training program), and so on. 

Sooner or later economics must enter all evaluations, since 
"costing out" the programs and the setting of implicit or explicit 
dollar measures of the worth of a program are essential steps in a 
complete evaluation. In making the required cost-benefit analysis, 
the part of economic theory that applies is the investment theory 
of public finance economics, with its infusion of welfare economics. 
The function of investment theory is to make commensurable in- 
puts and outcomes of a social action program which are spaced 
over time.10a Welfare economics analyzes the distinctions between 
financial costs and real resource costs, between direct effects of a 
program and externalities, and between efficiency criteria and 
equity (or distributional) criteria. 
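As a minimal sketch of how investment theory makes costs and benefits at different dates commensurable, the following discounts each year's amount to a present value. The dollar flows and the two discount rates are hypothetical, and `present_value` is an illustrative helper, not a procedure taken from the paper.

```python
def present_value(flows, rate):
    """Discount a list of yearly amounts (year 0 first) to a present value."""
    return sum(amount / (1.0 + rate) ** year for year, amount in enumerate(flows))

costs = [1000.0, 0.0, 0.0, 0.0]         # hypothetical: all program outlays in year 0
benefits = [0.0, 400.0, 400.0, 400.0]   # hypothetical: earnings gains in years 1 to 3

for rate in (0.05, 0.10):
    net = present_value(benefits, rate) - present_value(costs, rate)
    print(f"discount rate {rate:.0%}: net present value {net:.2f}")
```

With these invented figures the net present value is positive at 5 percent and negative at 10 percent, which is one reason the choice of a discount rate matters.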

We will say very little on the last mentioned distributional or 
equity question of who pays and who receives, even though we 
strongly feel that accurate data on the distribution of benefits and 
costs are essential to an evaluation of social action programs. However,
the task of conducting a "conventional" benefit-cost analysis 
(where the criterion is allocative efficiency) is sufficiently complex 
that we believe it preferable to separate the distributional questions. 

Program Inputs. In the investment theory model costs are 
attached to all inputs of a program and a single number emerges 
which measures the present value of the resources used. Most of 
the technical problems faced by the analysts on the input side are 
those of traditional cost accounting. We will confine our remarks 
to the two familiar and somewhat controversial problems of op- 
portunity costs and transfer payments, which arise in nearly every 
manpower program. Both of these problems are most effectively 
dealt with if one starts by asking: "What is the decision context 
for which these input measures are defined?" 

The most general decision context, and the one to which economists
most naturally refer, is that of the productivity of alternative
resource utilizations in society or the nation as a whole. 
In this case, one wishes to measure the cost of inputs in terms of 
the net reduction in value of alternative socially productive activi- 
ties caused by the use of the inputs in this particular activity. 
Now, the value of most inputs in terms of their alternative use 
will be more or less clearly indicated by their market price, but 
there are some inputs for which this will not be true. The most 
troublesome cases often concern the time of people. A well known 
example is the value of the time spent by students in school: since 
those over 14 or so could be in the job market, the social product 
(or national income) is less; therefore, an estimate is needed of 
what their earnings would be had they not been in school. (Such 
an estimate should reflect whatever amount of unemployment 
would be considered "normal.") For manpower programs the 
best evaluation design would provide a control group to measure 
the opportunity costs of the time spent by the trainees in the 
program. 

Sometimes the prices of inputs (market prices or prices fixed 
by the government) do not adequately reflect their marginal social 
productivity, and "corrected" or "shadow prices" are necessary. 
For example, the ostensible prices of leisure or of the housework of 
a wife are zero and obviously below their real price. By contrast 
a governmental fixed price of some surplus commodity is too high. 

The definition and treatment of transfer payments also depend 
on the decision context of the analysis. From the national perspec- 
tive money outlays from the budget of one program that are offset 
by reduced outlays elsewhere in society do not decrease the value 
of the social product. When these outlays are in the form of cash 
payments or consumption goods, they are called transfer payments. 
An example is the provision of room and board for Job Corps 
trainees. Since someone (their families or the trainees themselves)
would be meeting the costs of their room and board if they were not
in the program, the provision of these services by the program
reflects no net reduction in the value of alternative socially
productive activities. Whoever was paying these costs before will be
relieved of that burden and will spend the money thus saved on other
goods and services. If there has been an actual increase in the value
of food consumed by the trainee or in the quality of his housing, the
net increase can be counted as a program input, a cost. But in
general, it would be equal to the net increase in the value of food
and housing consumed, a benefit.11 To summarize, if these input costs
are simply being transferred from one individual or agency to another
individual or agency, they either represent no real cost of resources
of this program or they are a cost which is immediately offset by
the benefit it yields to the recipient, given that the decision
context is the general one that includes all members of society,
with no one member receiving any different weight in the calculation
of benefits.

In a narrower decision context the treatment of these costs may
shift; some input costs counted in the broader context are not counted
in the narrower one, and vice versa. One example of a narrow decision
context, a favorite of people in government but repugnant to most
economists, is the vaguely defined "public budget." Alternatively,
the decision context might be considered that of the "taxpayers'
viewpoint" if the program participants and their families are excluded
from the group considered as taxpayers. In this context
the only costs that are to be counted are those that come from the
public budget. Some of the examples mentioned above are now
reversed. Presumably, the opportunity cost of a student's
time spent in school is of no interest to the taxpayer (a
qualification is that the public budget would receive the taxes the
student would pay if he were working). Payments
for the cost of room and board to a Job Corpsman, which 
was considered a transfer payment above, would now be considered 
an input cost from the "taxpayer's viewpoint." The fact that the 
trainee or his family is relieved of this burden would be of no interest
since it would not be reflected in the public budget. However, 
if the costs of room and board had been met previously by a public 
welfare agency, then from the "taxpayer's viewpoint," the costs 
would not be charged to the Job Corps program. 
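The dependence of the cost total on the decision context can be sketched as a small accounting routine. The three inputs, their dollar amounts, and the field names are hypothetical, chosen only to mirror the examples in the text; the two rules are a simplified reading of the societal and taxpayer viewpoints, not a complete costing procedure.

```python
from dataclasses import dataclass

@dataclass
class Input:
    name: str
    amount: float
    uses_real_resources: bool                  # reduces other socially productive activity
    paid_from_public_budget: bool
    offsets_other_public_outlay: bool = False  # e.g. previously met by a welfare agency

def cost(inputs, context):
    """Total input cost under the 'society' or the 'taxpayer' decision context."""
    total = 0.0
    for item in inputs:
        if context == "society":
            counted = item.uses_real_resources
        elif context == "taxpayer":
            counted = item.paid_from_public_budget and not item.offsets_other_public_outlay
        else:
            raise ValueError(f"unknown decision context: {context}")
        if counted:
            total += item.amount
    return total

inputs = [
    Input("instruction", 3000.0, True, True),
    Input("trainee's forgone earnings", 2000.0, True, False),  # opportunity cost
    Input("room and board", 1500.0, False, True),              # transfer payment
]

print(cost(inputs, "society"))    # counts instruction and forgone earnings
print(cost(inputs, "taxpayer"))   # counts instruction and room and board
```

Setting `offsets_other_public_outlay=True` on the room and board item reproduces the case in which a public welfare agency had previously met those costs, so that they drop out of the taxpayer total as well.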

It is not uncommon to see several decision contexts used in one 
analysis, and used inconsistently. For example, the post-training 
earnings improvements from participation in a Job Corps program 
are considered benefits. We all recognize, of course, that the earnings
will be used mostly for consumption by the Job Corps graduate.
But in the same study, his consumption during training (room, 
meals, and spending allowance) is not viewed as conferring benefits 
on the corpsman.12 Or is it that the benefits should not count because,
while in training, he is not considered a member of "our 
society"? We leave this puzzle to those who prefer these restricted 
decision contexts. There are other such examples and still other 
and more narrow decision contexts, such as that of a local govern- 
ment or of the project by itself. But it is probably clear that our 
preference is for the national or total societal perspective. 

Program Outcomes. The problems of measurement on the out- 
come side of the evaluation problem are tougher to handle, and ex 
post evaluations of social action programs face particular problems 
because these outcomes are likely to involve behavioral relationships 
which are not well understood. It is particularly difficult to predict 
long run or permanent behavioral changes from the short run in- 
dicators revealed by the on-going or just completed program. 

The outcomes we wish to measure from many social action pro- 
grams occur months or years after the participants have completed 
the program. We can use proxy measures, which can themselves be 
measured during and soon after the program, but follow-up studies 
are clearly preferred and may in many cases be essential. A good 
deal depends on the confidence we have in the power of our theories 
to link the proxies or short-run effects (e.g., test scores, health 
treatments, employment experience in the short run, etc.) with 
the longer run goals (longer run educational attainment, longevity, 
incomes, or all of these and perhaps other "softer" measures of 
"well-being"). It is a role for "basic research" in the social sciences 
to provide this type of theoretical-empirical information to evaluations,
but we can also hope that the more thorough evaluation 
studies will contribute to our stock of "basic research" findings. 

The major obstacle to follow-up measures is the difficulty in 
locating people, particularly those from disadvantaged populations 
who may be less responsive and who have irregular living patterns. 
The biases due to nonresponse may be severe, since those participants
who are easiest to locate are likely to be the most "successful," 
both because of their apparent stability and because those who have 
"failed" may well be less responsive to requests to reveal their current
status. One way around the costly problem of tracking down 
respondents for earnings data is to use Social Security records for 
participant and control groups. The rights of confidentiality may 
be preserved by aggregating the data. 
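The direction of the nonresponse bias can be shown with a short piece of arithmetic. The two groups, their earnings, and their response rates below are invented for illustration; the helper names are ours.

```python
def true_mean(groups):
    """Mean earnings over all participants; groups = (share, mean_earnings, response_rate)."""
    return sum(share * earnings for share, earnings, _ in groups)

def expected_respondent_mean(groups):
    """Mean earnings among those the follow-up survey actually reaches."""
    located = sum(share * rate for share, _, rate in groups)
    return sum(share * rate * earnings for share, earnings, rate in groups) / located

groups = [
    (0.5, 4000.0, 0.9),   # hypothetical "successful" half: stable, easy to locate
    (0.5, 2000.0, 0.5),   # hypothetical "failed" half: harder to locate, less responsive
]

print(true_mean(groups))                  # mean over every participant
print(expected_respondent_mean(groups))   # mean over located respondents only
```

Because the easier-to-locate group carries more weight among respondents, the follow-up mean overstates the mean for all participants; administrative records covering everyone avoid this reweighting.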

Another problem in measuring outcomes, which also tends to 
be more talked about despairingly than coped with positively, is the 
category of external or third-party effects of the program. As a 
typical illustration consider a youth training program, which not 
only increases the earnings of the youths, but also reduces the incidence
of crime among these groups, which generally benefits the 
community (e.g., less damage and lower costs of prevention and
rehabilitation programs). Other third-party effects are 
those accruing to the participant's family members, including those 
yet to be born. It is an open question, however, whether the prob- 
lem for concern is the lack of measurement of these external effects, 
or the tendency by administrators and others (particularly friends 
of the programs) to exaggerate their likely importance and to [...]

[...] manpower programs [...] examples, [...] (b) Programs which
augment the supply of workers in [...] have little or no power to
protect their economic interests [...] who are rejected [...] refused
admission or [...] considerable separate influence on the positive and
negative effects [...] a point brought out in debates about the
relative merits of [...]

[...] provide for third-party individuals in the community. Thus, we are not 
proposing that the "community" be viewed as an "entity" separate 
from the individuals who comprise it. However, a separate focus 
on measures of community institutional changes appears necessary 
since the present state of our theories of community organization 
permits us little scope for anything except qualitative linkages between
institutional changes and their effects on individuals in the 
community. We can, for example, consider better communication 
between the neighborhood populace and the police, school officials, 
or the employment service as "good things," either in their own 
right, as expressions of the democratic ethic, or because we believe 
that such changes will have tangible effects in safety, school achieve- 
ment or better jobs. 



Intentional Experiments: A Suggested Strategy 

Underlying the growing interest in evaluations of social action 
programs is the enlightened idea that the scientific method can be 
applied to program experience to establish and measure particular 
cause and effect relationships which are amenable to change through 
the agents of public policy. However, traditional methods in science, 
whether the laboratory experimentation of the physical scientists, 
the testing of pilot models by engineers, or field testing of drugs 
by medical scientists, are seldom models that can be directly copied, 
helpful though they are as standards of rigor. 

In particular, evaluation designs patterned after the testing of 
pilot models, which correspond to "demonstration projects" in the 
field of social action programs, have been inadequate for both 
theoretical and operational reasons. The present state of our theories 
of social behavior does not justify settling on a unique plan of 
action, and we cannot, almost by definition, learn much about alter- 
native courses of action from a single pilot project. It is somewhat 
paradoxical that on the operational level the pilot model has failed 
to give us much information because the design has frequently 
been impossible to control and has spun off in different directions. 



The combination of, first, loose administration of and rapid 
changes in the operation of individual projects and second, a large 
scale program with many heterogeneous projects (different administrations,
different environments, different clientele, etc.), has 
led to the interesting view that this heterogeneity creates what are, 
in effect, "natural experiments" for an evaluation design. For economists,
who are used to thinking of the measurement of consumers' 
responses to changes in the price of wheat or investors' responses 
to changes in the interest rate, the idea of "natural experiments" 
has a certain appeal. But what should be clear from this discus- 
sion — and others before us have reached the same conclusion — is 
that a greatly improved evaluation could be obtained if social action 
programs were initiated in intentional experiments. 

When one talks of "experiments" in the social sciences what 
inevitably comes to mind is a small scale, carefully controlled 
study, such as those traditionally employed in psychology. Thus, 
when one suggests that social action programs be initiated as inten- 
tional experiments, people imagine a process which would involve 
a series of small test projects, a period of delay while those pro- 
jects are completed and evaluated, and perhaps more retesting 
before any major program is mounted. This is very definitely not 
what we mean when we suggest social action programs as inten- 
tional experimentation. We would stress the word action to high- 
light the difference between what we suggest versus the traditional 
small scale experimentation. 

Social action programs are undertaken because there is a clearly 
perceived social problem that requires some form of amelioration. 
In general (with the exception perhaps of the area of medicinal 
drugs, where a counter tradition has been carefully or painfully 
built up), we are not willing to postpone large scale attempts at 
amelioration of such problems until all the steps of a careful testing 
of hypotheses, development of pilot projects, etc. have been carried 
out. We would suggest that large scale ameliorative social action 
and intentional experimentation are not incompatible; experimental
designs can be built into a large scale social action program. 

If a commitment is made to a more frankly experimental social 
action program by decision-makers and administrators, then many 
of the objectives we have advocated can be addressed directly at 
the planning stage. If we begin a large national program with a 
frank awareness that we do not know which program concept is 
most likely to be efficacious, then several program models 
could be selected for implementation in several areas, with enough 
variability in the key elements which make up the concepts to 
allow good measures of the differential responses to those elements. 
If social action programs are approached with an "intentionally
experimental" point of view, then the analytical powers of our
statistical models of evaluation can be greatly enhanced by attempts 
to insure that "confounding" effects are minimized, i.e., that program
treatment variables are uncorrelated with participant characteristics
and particular types of environments. 
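One way such a design might be set up at the planning stage is sketched below: program models are assigned at random to project sites within each type of environment, so that by construction no model is concentrated in one kind of setting. The site names, the urban and rural labels, the three model names, and the `assign_models` helper are all hypothetical.

```python
import random
from collections import Counter

def assign_models(sites, models, seed=0):
    """sites maps site name -> environment type; returns site name -> program model."""
    rng = random.Random(seed)
    assignment = {}
    for env in sorted(set(sites.values())):
        block = sorted(name for name, e in sites.items() if e == env)
        rng.shuffle(block)                               # random order within the environment
        for i, name in enumerate(block):
            assignment[name] = models[i % len(models)]   # rotate through the models
    return assignment

sites = {f"site_{i:02d}": ("urban" if i < 6 else "rural") for i in range(12)}
models = ["model_A", "model_B", "model_C"]

assignment = assign_models(sites, models)
cells = Counter((sites[name], model) for name, model in assignment.items())
for cell, count in sorted(cells.items()):
    print(cell, count)   # every environment-by-model cell holds the same number of sites
```

The same blocking idea extends to participant characteristics, at the cost of needing more sites per cell.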

A less technical but equally important gain from this approach 
to social action programs is the understanding on the part of ad- 
ministrators, decision-makers, and legislators that if we are to 
learn anything from experience it is necessary to hold the design 
of the program (that is, the designed project differentials in treat- 
ment variables) constant for a long enough period of time to allow 
for the "settling down" of the program and the collection and 
analysis of the data. A commitment to hold to design for a long 
enough period so that we could learn from experience is a central 
element in the experimental approach to social action. 

The idea that social action programs should be experimental 
is simple, but we cannot be sanguine about the speed with which 
the full implications of this simple idea will be accepted by de- 
cision-makers and the public as a whole. The view that programs 
can be large scale action programs and still be designed as inten- 
tional experiments has not been easy to get across, even to those 
trained in experimental methods in the social sciences, with its 
tradition of small scale research. 






The emphasis on ex post evaluation is evidence of the fact that 
at some level legislators understand that social action programs are 
"testing" concepts. But it will require more explicit acceptance of 
the idea that some aspects of programs "tested" in action will fail 
before the full advantages of the intentionally experimental approach
can be realized. It takes restraint to mount a program with 
a built-in experimental design and wait for it to mature before de- 
ciding on a single program concept, but we emphasize that restraint 
does not mean small scale or limited action. 

It is not unfair, we think, to characterize the approach to social 
action programs that has been taken in the past as one of serial 
experimentation through program failure. A program is built 
around a single concept, eventually it is realized that it does not 
work, so the program is scrapped (or allowed to fade away) and 
a new program and concept are tried. Certainly serial experimentation
through failure is the hard way to learn. An intentionally 
experimental approach would allow us to learn faster by trying 
alternative concepts simultaneously and would make it more likely 
that we could determine not only that a particular concept failed, 
but also why it failed. 

The Acceptability of Evaluation Results 

It does little violence to the facts to state that few decisions 
about social action programs have been made on the basis of the 
types of evaluations we have been discussing thus far in this paper. 
A major reason for this, we feel, is an inadequate taste for rigor 
(or an overweening penchant for visceral judgments) by administrators
and legislators and an excessive taste for purely scientific 
standards by academics. It often seems that the scholars conspire 
with the legislators to beat down any attempt to bring to bear more 
orderly evidence about the effectiveness of alternative programs; it 
is not at all difficult to find experts who will testify that virtually 
any evaluation study is not adequately "scientific" to provide a 
sound basis for making program decisions. There is a reasonable 
and appropriate fear on the part of academics that sophisticated 
techniques of analysis will be used as deceptive wrapping around 
an essentially political kernel to mislead administrators or the 
public. This fear, however, often leads to the setting of standards 
of "proof" which cannot, at present, given the state of the art of 
social sciences, or perhaps never, given the inherent nature of social 
action programs, be satisfied. The result generally is that the eva- 
luation is discredited, the information it provides ignored, and the 
decision-maker and legislator can resume the exercise of their vis- 
ceral talents. 

A first step toward creating a more favorable atmosphere for 
evaluation studies is to recognize that they will not be final arbiters 
of the worth of a program. A positive but more modest role for 
evaluation research was recently stated by Kenneth Arrow in a 
discussion of the relative virtues of the traditional processes of 
public decision-making (characterized as an adversary process) 
and the recently developed procedure of the Programming, Planning,
Budgeting System (characterized as a rationalistic or 
"synoptic" process).16 Arrow advocated an approach in between 
forensics and synoptics.17 He illustrated his argument by making 
an analogy with the court system, suggesting that what was happening
through the introduction of the more rationalistic processes 
was the creation of a body of "rules of evidence." The use of systematic
evaluation (along with the other elements of the PPBS) 
represents an attempt to raise the standards of what is admissible 
as evidence in a decision process that is inherently likely to remain 
adversary in nature. Higher standards of evaluation will lessen 
the role of "hearsay" testimony in the decision process, but they 
are not meant to provide a hard and fast decision rule in and of 
themselves. The public decision-making process is still a long way 
from the point at which the evidence from a hard evaluation is the 
primary or even the significant factor in the totality of factors 
which determine major decisions about programs. Therefore, the 
fear of many academics that poorly understood evaluations will exercise
an inordinate influence on public decisions is, to say the 
least, extremely premature. But if standards for the acceptance 
of evaluation results are viewed in terms of the "rules of evidence" 
analogy, we can begin to move toward the judicious mix of rigor 
and pragmatism that is so badly needed in evaluation analysis. 

The predominant view of the role of "serious," independent evaluations 18
(particularly in the eyes of harried administrators) 
seems to be that of a trial (to continue the analogy) aimed at finding
a program guilty of failure. There is a sense in which this paranoid
view of evaluation is correct. The statistical procedures used 
usually start with a null hypothesis of "no effect," and the burden 
of the analysis is to provide evidence that is sufficiently strong to 
overturn the null hypothesis. As we have pointed out, however, 
problems of data, organization, and methods conspire to make clear- 
cut positive findings in evaluations difficult to demonstrate. 

The atmosphere for evaluations would be much healthier if the 
underlying stance were shifted from this old world juridical rule. 
Let the program be assumed innocent of failure until proven guilty 
through clear-cut negative findings. In more precise terms, we 
should try to avoid committing what are called in statistical theory 
Type II errors. Thus, an evaluation which does not permit rejecting 
the null hypothesis (of a zero effect of the program) at customary 
levels of statistical significance, may be consistent with a finding 
that a very large positive effect may be just as likely as a zero or 
negative effect.19 "Rules of evidence" which emphasize the avoidance
of Type II errors are equivalent to an attitude which we have 
characterized as "innocent until proven guilty." (We must frankly 
admit that, like court rules of evidence, this basic stance may pro- 
vide incentives to the program administrators to provide data which 
are sufficient only for arriving at a "no conclusion" evaluative 
outcome.) 
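A small numerical sketch, using a normal approximation and invented figures (an estimated earnings gain of 300 dollars with a standard error of 250), illustrates why a failure to reject the null hypothesis is weak evidence of failure. The function names are ours, and the two-sided z-test is an assumption made for simplicity.

```python
from statistics import NormalDist

def confidence_interval(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval for an estimated program effect."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error

def power(true_effect, std_error, alpha=0.05):
    """Chance of rejecting 'no effect' in a two-sided z-test when true_effect is real."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    shift = true_effect / std_error
    return (1.0 - NormalDist().cdf(z - shift)) + NormalDist().cdf(-z - shift)

low, high = confidence_interval(estimate=300.0, std_error=250.0)
print(round(low), round(high))        # the interval covers zero and also large gains
print(round(power(500.0, 250.0), 2))  # chance of detecting a true gain of 500
```

With these numbers the interval includes zero, so the null hypothesis is not rejected, yet it also includes a gain of 600, which lies exactly as far from the estimate as zero does. The chance of a Type II error for a true gain of 500 is one minus the reported power.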

As a final conciliatory comment: when we talk about evaluation 
studies leading to verdicts of "success" or "failure," it should be 
recognized that we are greatly simplifying and abbreviating the 
typical results. Most social action programs are so complex in the 
variety of inputs and the multiplicity of objectives, that simple 
over-all judgments are not likely to lead to quick decisions to dump 
programs. In combination with more detailed studies, the purpose 
of the evidence provided by the analysts will instead usually be to 
suggest modifications in the program — to shift the composition of 
inputs, perhaps to re-emphasize some objectives and de-emphasize 
others — and to suggest marginal additions or subtractions in the 
total scale of the program. It is worth emphasizing these modest 
objectives because the trust and cooperation of program administrators
are indispensable to an evaluation of the program. 



Footnotes 

1 As examples of the benefit-cost literature, see Robert Dorfman, ed., Measuring Benefits
of Government Investments (Brookings Institution, Washington, D.C., 1965), and 
A. R. Prest and R. Turvey, "Cost-Benefit Analysis: A Survey," Economic Journal, December,
1965, v. 75, pp. 683-735. As examples of the evaluation research literature, see 
Edward A. Suchman, Evaluative Research (Russell Sage Foundation, New York, 1967), 
Donald T. Campbell and Julian C. Stanley, Experimental and Quasi-Experimental Designs 
for Research (Chicago, Rand-McNally, 1966), G. H. Orcutt and A. G. Orcutt, "Incentive 
and Disincentive Experimentation for Income Maintenance Policy Purposes," American 
Economic Review, September, 1968, v. 58, pp. 754-72, and Harold Watts, "Graduated 
Work Incentives: Progress toward an Experiment in Negative Taxation," Discussion 
Papers Series, Institute for Research on Poverty, University of Wisconsin, 1968. For 
examples of the point of view of officials of governmental agencies, see William Gorham, 
"Notes of a Practitioner," and Elizabeth Drew, "HEW Grapples with PPBS," in The 
Public Interest, Summer, 1967, No. 8. 

2 There does seem to be a developing literature in which the a priori benefit-cost estimates
are compared with the ex post results for water projects. See Maynard Hufschmidt, 
"'Systematic Errors' in Cost Estimation in Public Investment," to appear in the
Universities-National Bureau of Economic Research Conference volume, The Economics of 
Public Output. It may be that similar follow-up studies are being undertaken for defense
projects; one can at least say that Congressional committees are determined to 
carry out their own follow-up evaluations on projects such as the TFX. 

In a more extended version of this paper prepared for the Institute for Research 
on Poverty, University of Wisconsin (Discussion Paper 49-69), we discuss several problems
associated with the specification of objectives of programs and how these affect
evaluation designs. 

We are indebted to Thomas K. Glennan, RAND Corporation, for his ideas on this 
point. 

4 Briefly, the OIC concept combines elements of training, job development (often aided 
by pressure tactics against employers), and a psychological up-lifting of the participants 
which is conducted with an ideology of militancy and participatory democracy. 

5 The Work Experience program consisted of public employment of welfare recipients 
and other adult poor under Title V of the Economic Opportunity Act. Only minimal training
was offered, but it was hoped that work-for-pay would, by itself, provide a springboard
to self-sustaining employment in the private market. 









*U, S. Congress, House Committee on Ways and Means, Community Work and 
Training Program. 90th Congress, 1st Sess., House Document No. 96 (Washington, D.O.: 
U. 6. Government Printing Office, 1967). 

? Worth Bateman, “Assessing Program Effectiveness,” Welfare in Review, Vol. 6, 
No. 1, January-Februairy 1968. 

74 See the version of this paper in the Discussion 8eries of the Institute for Research 
on Poverty. 

8 An important point to be remembered is that, for any given amount of resources
available for an evaluation study, there is a trade-off between an allocation of these
resources for increased sample size and allocation for improved quality of measurement,
which might take the form of an expanded set of variables, improved measures of
variables, or reduced attrition from the sample. Too often we have witnessed a single-minded
attachment to larger sample sizes, probably stemming from the analyst's fear that he
will end up with "too few observations in the cells" of some only vaguely imagined
cross-tabulation. This fear should be balanced by an awareness both of the rapidity with
which marginal gains in precision of estimates decline with increases in "medium size"
samples and of the extent to which a theoretically justified multiple regression model can
overcome some of the limitations which cross-tabulation analysis imposes on a given-sized
sample.
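
As a rough illustration of how quickly marginal gains in precision decline with sample size, the sketch below assumes simple random sampling and an estimate (a sample mean) whose standard error is sigma divided by the square root of n. The function names and the value sigma = 1.0 are illustrative assumptions and do not come from the paper.

```python
import math


def standard_error(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n).

    Assumes n independent observations with standard deviation sigma.
    """
    return sigma / math.sqrt(n)


def marginal_gain(sigma, n, extra):
    """Reduction in the standard error from adding `extra` observations to n."""
    return standard_error(sigma, n) - standard_error(sigma, n + extra)


if __name__ == "__main__":
    sigma = 1.0  # hypothetical standard deviation of the outcome measure
    for n in (100, 400, 1600):
        # Columns: sample size, standard error, gain from 100 more observations
        print(n, standard_error(sigma, n), marginal_gain(sigma, n, 100))
```

Under these assumptions, quadrupling the sample from 100 to 400 halves the standard error, while the same 100 additional observations buy far less precision at n = 1,600 than at n = 100. Resources spent on those extra observations might instead go to better measurement or to reduced attrition.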

9 See the vigorous defense of an experimental method in social action programs in:
Guy H. Orcutt and Alice G. Orcutt, op. cit.

10 This assumption will strike some readers as too positivistic, too restrictive to
"things measurable," and too oblivious to the unmeasurable and subjective variables. Let
us say in defense of this assumption only that it is a "working assumption" that
permits us to discuss an important region of evaluation which covers the measurable portion,
that it is desirable to expand this region and, therefore, to narrow the area left for
subjective judgments, and that, in any case, the objective portion is necessary to an
improved over-all judgment that spans both measurable and unmeasurable inputs and
outputs of a program.

We bypass here the important question of the choice of a discount rate. Some
discussion of this issue is provided in the Institute for Research on Poverty version of this
paper.

11 When the program produces an increase in consumption of goods and services, the
treatment of these transfer payments can become more complicated if we do not assume
that the goods and services have a value to the recipients equal to their cost. See A. A.
Alchian and W. R. Allen, University Economics (Wadsworth: Belmont, California, 1967,
Second Edition) pp. 186-140 for an extended discussion.

12 For just one of many examples of this type of treatment of transfer payments
see "The Feasibility of Benefit-Cost Analysis in the War on Poverty: A Test Application
to Manpower Programs," prepared for the General Accounting Office, Resource
Management Corporation, UR-054, December 18, 1968.

13 For a notable exception to the absence of attempted measurement of the type
of third-party effects discussed above, see Thomas I. Ribich, Education and Poverty
(Washington, D.C.: The Brookings Institution, 1966). Ribich's study also gives us some evidence
of the likelihood of relatively small quantitative magnitudes of these effects. A rather
free-wheeling listing of third-party effects runs the risk of double counting benefits. For
example, although other family members benefit from the better education or earnings of the
head of the household, we should not forget that had the investment expenditure been
elsewhere, even if in the form of an across-the-board tax cut, other family heads would
have had larger incomes, at least, with resulting benefits to their families. In his
examination of cost-benefit analysis of water resources developments, Roland N. McKean
gives an extended discussion of the pitfalls of double counting. See his Efficiency in
Government Through Systems Analysis (New York: John Wiley and Sons, Inc., 1958),
especially Chapter 9.

14 An exceptionally good discussion of negative external effects, including disruption
of the community structure, is contained in Anthony Downs, "Uncompensated
Non-Construction Costs Which Urban Highways and Urban Renewal Impose on Residential
Households," which will appear in a Universities-National Bureau of Economic Research
Conference volume entitled Economics of Public Output. The literature on urban renewal
and public housing is extensive and too well known to require listing here.

15 For an excellent discussion of many of these issues see Joel F. Handler,
"Controlling Official Behavior in Welfare Administration," The Law of the Poor, ed., J.
tenBroek (Chandler Publishing Co., 1966). (Also published in The California Law
Review, Vol. 54, 1966, p. 479.)

16 For a more complete discussion of this terminology, see Henry Rowen, "Recent
Developments in the Measurement of Public Outputs," to be published in a
Universities-National Bureau of Economic Research Conference volume, The Economics of Public
Output.

17 Remarks by Kenneth Arrow during the NBER conference cited in the previous
footnote.

18 We mean here to exclude the quick and casual sort of evaluations, mainly
"in-house" evaluations, that more often than not are meant to provide a gloss of technical
justification for a program.

19 Harold Watts has stressed this point in conversations with the authors. See Glen
G. Cain and Harold W. Watts, "The Controversy about the Coleman Report: Comment,"
Journal of Human Resources, Vol. III, No. 3, Summer, 1968, pp. 389-92; also, Harold
W. Watts and David L. Horner, "The Educational Benefits of Head Start: A
Quantitative Analysis," Discussion Paper Series, The Institute for Research on Poverty, University
of Wisconsin, Madison, Wisconsin.






PUBLICATIONS OF THE
INDUSTRIAL RELATIONS RESEARCH INSTITUTE
AND CENTER FOR STUDIES IN VOCATIONAL
AND TECHNICAL EDUCATION

State Labor and Social Legislation. A Symposium in Honor of Elizabeth Brandeis Raushenbush, May 20, 1966. $4.60 cloth, $2.60 paper.

† Archie Kleingartner. Professionalism and Salaried Worker Organization. 1967.

† Kenneth McLennan. Managerial Skill and Knowledge. 1967.

The Labor Movement: A Reexamination. A Conference in Honor of David J. Saposs, January 14 and 15, 1966. $2.60.

Bhal J. Bhatt. Labor Market Behavior of Factory Workers in Bombay. 1969. $4.50 cloth, $3.00 paper.

CONFERENCE PROCEEDINGS OF THE CENTER

† Occupational Data Requirements for Education Planning, June 15 and 16, 1966.

Research in Vocational and Technical Education, June 10 and 11, 1966. $3.00.

Research in Apprenticeship Training, September 8 and 9, 1966. $3.00.

† Research Approaches to the Initiation of New Vocational-Technical Programs, October 3-7, 1966.

† The Encouragement of Research in Predominantly Negro Universities, February 23, 1967.

The Education and Training of Racial Minorities, May 11 and 12, 1967. $3.00.

Education and Training in Correctional Institutions, 1968. $3.00.

PERIODICALS

The Journal of Human Resources, published quarterly. $8.00 per year for individuals, $12.00 per year for institutions (add $0 i a year for foreign mailing). Subscriptions available from Journals Department, The University of Wisconsin Press, P.O. Box 1379, Madison, Wisconsin 53701.

IRRI Report, published semiannually.



CENTER REPRINTS*

† W. Lee Hansen. "Human Capital Requirements for Educational Expansion: Teacher Shortages and Teacher Supply," Education and Economic Development (1966).

† William J. Swift and Burton A. Weisbrod. "On the Monetary Value of Education's Intergeneration Effects," Journal of Political Economy (December 1966).

† W. Lee Hansen. "Labor Force and Occupational Projections," Proceedings of the 18th Annual Meeting of the Industrial Relations Research Association, 1968.

† Burton A. Weisbrod. "Investing in Human Capital," Journal of Human Resources (Summer 1966).

† Gerald G. Somers. "Government-Subsidized On-The-Job Training: Surveys of Employers' Attitudes," Hearings before the Subcommittee on Employment and Manpower, 89th Congress (September 1966; February 1966).

† Burton A. Weisbrod. "Conceptual Issues in Evaluating Training Programs," Monthly Labor Review (October 1966).

† Gerald G. Somers. "Retraining the Unemployed Older Worker," Technology, Manpower and Retirement Policy (1966).

† Glen Cain, W. Lee Hansen, and Burton A. Weisbrod. "Occupational Classification: An Economic Approach," Monthly Labor Review (February 1967).

Norman F. Dufty. "Apprenticeship-A Theoretical Model," British Journal of Industrial Relations (March 1967).

† J. K. Little. "The Occupations of Non-College Youth," American Educational Research Journal (March 1967).

† Glen Cain and Gerald Somers. "Retraining the Disadvantaged Worker," Research in Vocational and Technical Education, Proceedings of a Conference, 1966.

Gerald G. Somers. "Our Experience With Retraining and Relocation," Toward a Manpower Policy (1967).

† Jack Barbash. "Union Interests in Apprenticeship and Other Training Forms," Journal of Human Resources (Winter 1968).

Gerald G. Somers. "The Response of Vocational Education to Labor Market Changes," Journal of Human Resources, Supplement (1968).

† Walter Fogel. "Labor Market Obstacles to Minority Job Gains," Proceedings of the 20th Annual Winter Meeting of the Industrial Relations Research Association, 1967.

Gerald G. Somers and Graeme H. McKechnie. "Vocational Retraining Programs for the Unemployed," Proceedings of the 20th Annual Winter Meeting of the Industrial Relations Research Association, 1967.

Burton A. Weisbrod and Peter Karpoff. "Monetary Returns to College Education, Student Ability, and College Quality," Review of Economics and Statistics (November 1968).

David B. Johnson and James L. Stern. "Why and How Workers Shift From Blue-Collar to White-Collar Jobs," Monthly Labor Review (October 1969).

Philip A. Perrone and Donald H. Johnson. "The Marginal Worker: Projections of High School Vocational Teachers," Journal of Human Resources (Fall 1968).

Glen G. Cain and Robinson G. Hollister. "The Methodology of Evaluating Social Action Programs," Public-Private Manpower Policies, Industrial Relations Research Association (November 1969).

† Out of Print

* Single copies available without charge






